Improving Unsegmented Dialogue Turns Annotation with N-gram Transducers
Abstract
The statistical models used in dialogue systems need annotated data (dialogues) to infer their statistical parameters. Dialogues are usually annotated in terms of Dialogue Acts (DA). The annotation problem can be addressed with statistical models, which avoid having to annotate the dialogues from scratch. Most previous work on automatic statistical annotation assumes that the dialogue turns are already segmented into the corresponding meaningful units. However, this segmentation is not usually available. The most recent works attempted to annotate unsegmented turns using an extension of the models used in the segmented case, but they showed a dramatic decrease in performance. In this work we propose an enhanced annotation technique based on N-gram transducers that achieves higher accuracy than the classical HMM-based model for the annotation and segmentation of unsegmented turns.
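The abstract does not describe the model in detail. As a rough illustration only, one common way to build an N-gram transducer for this kind of task is to pair every word with an output symbol and train an ordinary N-gram model over those pairs; decoding an unsegmented turn then chooses segment boundaries and DA labels jointly. The sketch below is not the authors' method. It assumes a bigram order, add-alpha smoothing, a per-word tag of the form (DA label, closes-segment flag), and invented toy data; the names `BigramPairModel`, `turn_to_pairs`, and `to_segments` are hypothetical.

```python
import math
from collections import defaultdict

# Sentinel pair tokens marking the start and end of a turn (assumed encoding).
BOS = ("<s>", ("<s>", True))
EOS = ("</s>", ("</s>", True))


def turn_to_pairs(segments):
    """Turn [(label, [words]), ...] into (word, (label, closes_segment)) pairs."""
    pairs = []
    for label, words in segments:
        for i, w in enumerate(words):
            pairs.append((w, (label, i == len(words) - 1)))
    return pairs


def to_segments(words, tags):
    """Rebuild [(label, [words]), ...] from a decoded tag sequence."""
    segments, current = [], []
    for w, (label, closes) in zip(words, tags):
        current.append(w)
        if closes:
            segments.append((label, current))
            current = []
    return segments


class BigramPairModel:
    """Hypothetical bigram model over (word, tag) pairs, used as a transducer."""

    def __init__(self, turns, alpha=0.1):
        self.alpha = alpha
        self.bigrams = defaultdict(int)
        self.unigrams = defaultdict(int)
        self.vocab = {EOS}
        self.tags = set()
        for segments in turns:
            seq = [BOS] + turn_to_pairs(segments) + [EOS]
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.unigrams[prev] += 1
                self.vocab.add(cur)
            self.tags.update(tag for _, tag in seq[1:-1])

    def logp(self, prev, cur):
        # Simple add-alpha smoothing; an illustrative choice, not the paper's.
        num = self.bigrams.get((prev, cur), 0) + self.alpha
        den = self.unigrams.get(prev, 0) + self.alpha * len(self.vocab)
        return math.log(num / den)

    def decode(self, words):
        """Viterbi search that jointly segments and labels an unsegmented turn."""
        if not words:
            return []
        tags = sorted(self.tags)
        # best[tag] = (log score, tag path) of the best prefix ending in that tag
        best = {t: (self.logp(BOS, (words[0], t)), [t]) for t in tags}
        for w_prev, w in zip(words, words[1:]):
            new = {}
            for t in tags:
                cands = []
                for pt, (score, path) in best.items():
                    # A segment that is still open must keep the same DA label.
                    if not pt[1] and pt[0] != t[0]:
                        continue
                    cands.append((score + self.logp((w_prev, pt), (w, t)), path + [t]))
                if cands:
                    new[t] = max(cands, key=lambda c: c[0])
            best = new
        # The last word must close a segment before the end-of-turn marker.
        finals = [(s + self.logp((words[-1], t), EOS), p)
                  for t, (s, p) in best.items() if t[1]]
        return max(finals, key=lambda c: c[0])[1]


# Invented toy training data: each turn is a list of (DA label, segment words).
TRAIN = [
    [("greet", ["hello"]), ("question", ["what", "time", "is", "it"])],
    [("question", ["what", "time", "is", "it"])],
    [("greet", ["hello"]), ("thanks", ["thank", "you"])],
]
model = BigramPairModel(TRAIN)
turn = ["hello", "what", "time", "is", "it"]
print(to_segments(turn, model.decode(turn)))
```

On this toy data the unsegmented turn should come back as a `greet` segment followed by a `question` segment. Putting the boundary flag inside the output symbol is what lets a single search recover both the segmentation and the labels; a real system would use a higher N-gram order, better smoothing, and a much larger corpus.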
Similar resources
Segmented and Unsegmented Dialogue-Act Annotation with Statistical Dialogue Models
Dialogue systems are one of the most challenging applications of Natural Language Processing. In recent years, some statistical dialogue models have been proposed to cope with the dialogue problem. The evaluation of these models is usually performed by using them as annotation models. Many of the works on annotation use information such as the complete sequence of dialogue turns or the correct ...
Evaluation of HMM-based Models for the Annotation of Unsegmented Dialogue Turns
Corpus-based dialogue systems rely on statistical models, whose parameters are inferred from annotated dialogues. The dialogues are usually annotated using Dialogue Acts (DA), and the manual annotation is difficult and time-consuming. Therefore, several semi-automatic annotation processes have been proposed to speed up the process. The standard annotation model is based on Hidden Markov Models (...
Improving Unsegmented Statistical Dialogue Act Labelling
An important part of a dialogue system is the correct labelling of turns with dialogue-related meaning. This meaning is usually represented by dialogue acts, which give the system semantic information about user intentions. Each dialogue act gives the semantics of a segment of a turn, which can be formed by several segments. Probabilistic models that perform dialogue act labelling can be used on...
Using Bigrams to Identify Relationships Between Student Certainness States and Tutor Responses in a Spoken Dialogue Corpus
We use n-gram techniques to identify dependencies between student affective states of certainty and subsequent tutor dialogue acts, in an annotated corpus of human-human spoken tutoring dialogues. We first represent our dialogues as bigrams of annotated student and tutor turns. We next use χ² analysis to identify dependent bigrams. Our results show dependencies between many student states and su...
Parallel Corpora Segmentation Using Anchor Words
A new technique for monotone segmentation of parallel corpora is introduced. This segmentation is based on a set of anchor words which are defined manually. The parallel segments are computed using a dynamic programming algorithm. To assess this technique, finite-state transducers are inferred from both non-segmented and segmented corpora. Experiments have been carried out with Spanish-English ...
Publication date: 2009